Generic Entity Resolution in the SERF Project
Abstract
The SERF project at Stanford addresses the Entity Resolution (ER) problem, in which records determined to represent the same real-world “entities” (such as people or products) are successively located and combined. The approach we pursue is “generic”: the specific functions used to match and merge records are treated as black boxes, which permits efficient, expressive, and extensible ER solutions. This paper motivates and introduces the principles of generic ER, and gives an overview of the research directions we have explored in the SERF project over the past two years.
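The abstract describes generic ER only at a high level. As a rough illustration (a sketch under stated assumptions, not the SERF project's own algorithm), the loop below treats `match` and `merge` as black-box functions supplied by the caller and keeps merging records until no two remaining records match. The record layout (a dict mapping attribute names to sets of values) and the rules `share_name_or_email` and `union_merge` are hypothetical, chosen only for this example.

```python
from typing import Callable, Dict, List, Set

# Assumed record layout for this sketch: attribute name -> set of values.
Record = Dict[str, Set[str]]
MatchFn = Callable[[Record, Record], bool]
MergeFn = Callable[[Record, Record], Record]


def resolve(records: List[Record], match: MatchFn, merge: MergeFn) -> List[Record]:
    """Merge matching records until no two output records match.

    `match` and `merge` are black boxes: the loop only calls them and never
    looks at how they decide. `match` is assumed to be symmetric.
    """
    pending = list(records)        # records still to be compared
    resolved: List[Record] = []    # invariant: no two of these match
    while pending:
        current = pending.pop()
        partner_index = None
        for i, candidate in enumerate(resolved):
            if match(current, candidate):
                partner_index = i
                break
        if partner_index is None:
            # No match: current is (for now) a distinct entity.
            resolved.append(current)
        else:
            # Replace the pair with their merge, and send the merged record
            # back to be compared again, since it may now match other records.
            partner = resolved.pop(partner_index)
            pending.append(merge(current, partner))
    return resolved


def share_name_or_email(a: Record, b: Record) -> bool:
    # Hypothetical match rule: records match if they share a name or an email.
    return bool(a.get("name", set()) & b.get("name", set())) or bool(
        a.get("email", set()) & b.get("email", set())
    )


def union_merge(a: Record, b: Record) -> Record:
    # Hypothetical merge rule: keep every value seen for every attribute.
    return {k: a.get(k, set()) | b.get(k, set()) for k in a.keys() | b.keys()}


records = [
    {"name": {"J. Doe"}, "email": {"jd@example.com"}},
    {"name": {"John Doe"}, "email": {"jd@example.com"}, "phone": {"555-1234"}},
    {"name": {"John Doe"}, "phone": {"555-9999"}},
    {"name": {"Ann Lee"}, "email": {"ann@example.com"}},
]
result = resolve(records, share_name_or_email, union_merge)
for entity in result:
    print(sorted(entity["name"]), sorted(entity.get("phone", set())))
```

In this example the first and third records never match each other directly, yet they end up in the same entity because each matches the second record; this is why records are located and combined successively rather than compared once. The loop only yields a complete, consistent answer if the black-box functions behave reasonably (for instance, `match` is symmetric and a merged record still matches whatever its inputs matched); that requirement is an assumption of this sketch.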
Similar resources
Developments in Generic Entity Resolution
Entity resolution (ER) is the problem of identifying which records in a database refer to the same entity. Although ER is a well-known problem, the rapid increase of data has made ER a challenging problem in many application areas ranging from resolving shopping items to counter-terrorism. The SERF project at Stanford focuses on providing scalable and accurate ER techniques that can be used acr...
Development of a Generic Risk Matrix to Manage Project Risks
A generic risk matrix is presented for use in identifying and assessing project risks quickly and cost-effectively. It helps project managers with few resources perform project risk analysis. The generic risk matrix (GRM) contains a broad set of risks that are categorized and ranked according to their potential impact and probability of occurrence. The matrix assists PMs in quickly identifyin...
The Effect of Transitive Closure on the Calibration of Logistic Regression for Entity Resolution
This paper describes a series of experiments in using logistic regression machine learning as a method for entity resolution. From these experiments the authors concluded that when a supervised ML algorithm is trained to classify a pair of entity references as a linked or unlinked pair, the evaluation of the model’s performance should take into account the transitive closure of its pairwise lin...
Corpus based coreference resolution for Farsi text
"Coreference resolution", or "finding all expressions that refer to the same entity" in a text, is one of the important requirements in natural language processing. Two words are coreferent when both refer to a single entity in the text or the real world. So the main task of coreference resolution systems is to identify terms that refer to a unique entity. A coreference resolution tool could be...
Study and Evaluation of Formulation and Nutritional values of parenteral Nutrition Formulae of Iranian Generic Project in Ten Major General Hospitals
Abstract: Parenteral Nutrition Formulae in the Iranian Generic Project were studied in 31 patients hospitalized in ten major general hospitals in Tehran and Shiraz. Estimated theoretical energy (Calculated Energy Value or CEE) using Harris-Benedict equations and applying the Cerra stress factor was compared with the amount of energy estimated via indirect calorimetric technique (Measured Energy ...
Journal title:
- IEEE Data Eng. Bull.
Volume 29
Publication date 2006